Talk outline
Introduction

We represent a group of linguists from Tyumen, Western Siberia, who are developing a corpus of imperfect translations, which we have called the Russian Learner Parallel Corpus (Slide 1 – name and place). The corpus is being compiled from English-Russian and Russian-English translations by students from four Russian state universities. The project is still under development. As of now it includes about 2,000 texts, counting both sources and translations; the current version totals half a million words, and we keep adding texts to it. The planned total for the Corpus is 10 million words (Slide 2 – Corpus statistics).

Aims and Features

The Corpus has three important features which, in combination, distinguish it from existing similar projects (such as the MeLLANGE Learner Translator Corpus): 1) it includes Russian as a working language; 2) most originals have a variety of translations; 3) it is based on translations that are rich in mistakes (Slide 3 – distinctive features). The Corpus is meant to provide reliable research material both for studying mistakes in translation and for analyzing translational variability.

The Corpus Material Exemplified

A short example of the material this Corpus contains is shown on Slide 4.

Source 1 (from an essay on the death penalty): The warrior and the executioner do similar jobs. Both kill the enemies of the state.

Translation 1.1. Боец и палач имеют одинаковую миссию. Оба убивают врагов народа.

Translation 1.2. Воин и палач выполняют схожую работу. Они уничтожают врагов государства.

Translation 1.3. Солдаты и палачи в чем-то похожи. И те и другие ''убивают'' врагов государства.

Translation 1.4. Воин и палач выполняют одну и ту же работу. Им обоим приходится уничтожать врагов народа.
As can be seen from the slide, none of these translations distorts the idea of the source text; they offer various ways of rendering the same original message. Some individual translations, however, contain stylistic inaccuracies, which are shown in italics, and lexical mistakes, which are underlined.

The User Interface Based on Modified TMX

We have already made this collection available via the Internet. As of now the corpus is an orderly collection of these texts in plain text format, along with header files containing the extralinguistic information for each text. We are currently working on software to make it easy to use. This slide (Slide 5 – the screenshot) shows the user interface. It already exists as a working model and offers the researcher a variety of choices. Its main function is to return the translated versions of the element entered in the "search" field. If necessary, you can get wider contexts for the sentences as well as extralinguistic information on each text. Users can narrow their query by a number of parameters, such as the translator's gender, experience or affiliation, the genre of the source text, and the conditions and year of the translation.

What's under the hood?

The program uses a modified TMX file (shown on Slide 6), which contains all of our sources and translations aligned against each other at the level of sentence-long segments. We have developed a script that adds extra data to the standard TMX so that each segment carries the name of its original txt file and the associated extralinguistic information. Another improvement built into our TMX is that it can show several translations for each original segment. This technical solution differs from the one announced in the article we originally submitted to Dialogue. We had planned to use the open-source program behind Mona Baker's Translational English Corpus, but found that it is unable to process TMX.
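To make the idea of a TMX file with several translations per segment more concrete, here is a minimal sketch in Python. It is not the project's actual script or file format: the `prop` type names (`x-file`, `x-gender`), the file names, the gender values and the functions `read_units` and `search` are all assumptions made for illustration. Only the English sentence and the two Russian renderings are taken from the Slide 4 example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: one source segment with two student translations.
# The x-file / x-gender props and their values are invented for illustration.
SAMPLE_TMX = """<tmx version="1.4">
  <body>
    <tu>
      <tuv xml:lang="en">
        <prop type="x-file">source_001.txt</prop>
        <seg>Both kill the enemies of the state.</seg>
      </tuv>
      <tuv xml:lang="ru">
        <prop type="x-file">translation_001_1.txt</prop>
        <prop type="x-gender">f</prop>
        <seg>Оба убивают врагов народа.</seg>
      </tuv>
      <tuv xml:lang="ru">
        <prop type="x-file">translation_001_2.txt</prop>
        <prop type="x-gender">m</prop>
        <seg>Они уничтожают врагов государства.</seg>
      </tuv>
    </tu>
  </body>
</tmx>"""


def read_units(tmx_text, src_lang="en"):
    """Return a list of (source, translations) pairs, one per <tu>.

    Each segment is a dict holding its <prop> values plus its text.
    """
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        source, translations = None, []
        for tuv in tu.findall("tuv"):
            record = {p.get("type"): (p.text or "") for p in tuv.findall("prop")}
            record["text"] = tuv.findtext("seg", default="")
            # The first segment in the source language is the original;
            # every other segment counts as one of its translations.
            if source is None and tuv.get(XML_LANG) == src_lang:
                source = record
            else:
                translations.append(record)
        units.append((source, translations))
    return units


def search(units, query, filters=None):
    """Find sources containing `query`; keep translations matching `filters`."""
    filters = filters or {}
    hits = []
    for source, translations in units:
        if source is None or query.lower() not in source["text"].lower():
            continue
        kept = [t["text"] for t in translations
                if all(t.get(key) == value for key, value in filters.items())]
        hits.append((source["text"], kept))
    return hits


if __name__ == "__main__":
    for src, targets in search(read_units(SAMPLE_TMX), "enemies", {"x-gender": "f"}):
        print(src)
        for target in targets:
            print("   ", target)
```

Storing the metadata as `prop` elements inside each segment keeps the file readable by generic XML tools, while the search function mirrors the interface behaviour described above: look up an element in the source, then narrow the translations by translator parameters.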
Uses

We see two basic spheres of application for our Corpus.

1) A scholar can use the Corpus to draw conclusions on 1.1. mistakes in translation and 1.2. translational variability.

For example, within the first area one can research
* types of mistakes in rendering English articles;
* types of mistakes and their correlation with the translator's extralinguistic parameters, such as gender or affiliation;
* types of mistakes that interfere with the message of the text in translation;
* typical mistakes in the two directions of translation, studied comparatively.

The study of the choices translators make can answer such questions as
* whether translators working into their B language (not their mother tongue) render the stylistic features of the source, and how they do it;
* what the limits of variability are for acceptable translations;
* whether trends in translating politically correct or incorrect words develop over the years.

2) Even wider and more immediately practical applications of the Corpus can be seen in translator training. It can be used as material in post-editing classes and in translation studies courses, where students practice spotting mistakes, explaining their nature and offering other variants, as well as comparing different ways of expressing the same idea. It also provides ample material for developing learners' insight into the much-discussed issue of quality in translation. It helps to teach the difference between standard and individual translation problems and to generalize about ways of overcoming them. On the basis of the Corpus, a methodologist can also draw didactic conclusions and target mistakes typical of Russian students in their training, including language mistakes induced by source texts and revealed through comparative analysis of the translations and original target-language texts.
Further research

The immediate research goals include expanding the Corpus towards the planned 10 million words, and we plan to use the power of crowdsourcing for that. Once we make the searchable Corpus available online, we plan to add an interface option for those who want to donate their translations to the project or take part in it by helping to spot and remove alignment mistakes. Secondly, the Corpus will receive morphological and syntactic mark-up. Last but not least, and this is my personal aspiration as a teacher, we plan to develop a system for classifying mistakes in translation that can be used for XML-based descriptive linguistic mark-up of such mistakes. Each mistake can then be assigned a relative value, which can be helpful in assessing translations.

Optional

I cannot write without adding references and comments, and I see no way of doing that here. So. Perhaps replace "about 2000 both source texts and translations" with the number of originals and the number of translations; that would show that the ratio is not 1:1. In the spoken version, give the figures without a breakdown by language; on the slide, with one. And I can read this text in 8 minutes :-) First of all, cut the part about what is currently available online!

Additional info

Limitations